Counselor Availability at CPS High Schools

CAPP 30239 Final Static Portfolio
Aya Liu
Feb 9, 2020

Set up

In [0]:
#@title
%%capture
!apt install libspatialindex-dev
!pip install rtree
!pip install geopandas
!git clone https://github.com/aya-liu/map-cps-counselors.git
In [0]:
#@title
import os
os.chdir('map-cps-counselors')
import pandas as pd
import numpy as np
import rtree
import geopandas as gpd
import altair as alt
from preprocess_data import *
pd.options.display.max_columns = 999
pd.options.display.max_rows=999
In [0]:
#@title
# load preprocessed data
areas = gpd.read_file('data/preprocessed/areas.geojson')
school_counsel = pd.read_csv('data/preprocessed/school_to_counselors.csv')
school_counsel = school_counsel[school_counsel['Primary_Category']=='HS']
ratings_to_quality = {'Level 1+': 'High Quality', 
                      'Level 1': 'High Quality',
                      'Level 2+': 'Medium Quality',
                      'Level 2': 'Low Quality',
                      'Level 3': 'Low Quality',
                      'Inability to Rate': 'N/A'}
school_counsel['Quality'] = school_counsel['Overall_Rating'].replace(ratings_to_quality)

school_counsel['has_counsel'] = np.where(school_counsel['num_std_per_csl'] == np.inf,
                                         'No Counselor',
                                         'Has Counselor')
sorted_quality = ['High Quality', 'Medium Quality', 'Low Quality']

Theming

In [220]:
#@title
def my_theme():
  font = 'Futura'
  main_palette = ["#98b3b5",
                  "#d77a66",
                  "#59483d",
                  "#bab987"]
        
  return {'config': {'background': '#f7f5f5', 
                     'padding': {'left': 20, 
                                 'top': 20, 
                                 'right': 20, 
                                 'bottom': 20},
                     'view': {'height': 400, 
                              'width': 800, 
                              'strokeWidth': 0}, 
                     'title': {'anchor': 'start', 
                               'fontSize': 20, 
                               'font': font, 
                               'subtitleFont': font},
                     'axis': {'titleFont': font, 
                              'labelFont': font,
                              'titleFontSize': 14, 
                              'labelFontSize': 14, 
                              'titlePadding': 20, 
                              'labelPadding': 5, 
                              'labelLimit': 500}, 
                     'legend': {'titleFont': font, 
                                'labelFont': font,
                                'titleFontSize': 14, 
                                'labelFontSize': 14, 
                                'labelLimit': 500, 
                                'padding': 10, 
                                'strokeColor': '#d3d3d3', 
                                'fillColor': '#ffffff'},
                     'header': {'titleFont': font, 
                                'labelFont': font,
                                'titleFontSize': 14, 
                                'labelFontSize': 14}, 
                     'range': {'category': main_palette}
                     }
          } 

alt.themes.register('my_theme', my_theme)
alt.themes.enable('my_theme')
Out[220]:
ThemeRegistry.enable('my_theme')

Charts

Introduction

School counselors provide both academic and personal support for high school students. Since public high schools are a primary resource from which most adolescents can seek support, the availability and quality of school counselors can make a huge difference in the students' wellbeing and academic performance. This is especially true for students who are experiencing high levels of economic and psychological distress, since they are not likely to receive stable support from other channels.

Let's take a look at what public school counselor availability is like in Chicago, especially across different school quality levels.

School Quality is coded from the CPS SQRP Ratings (Level 1+ to 3) as follows:

  • High: Level 1+, Level 1
  • Medium: Level 2+
  • Low: Level 2, Level 3

High- and medium-quality schools are considered to be in good standing; low-quality schools receive external support.

In [221]:
#@title
d = school_counsel.groupby(['Quality', 'has_counsel']).size().to_frame()
d['pct'] = d.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
d = d.reset_index()
d.columns = ['Quality', 'has_counsel', 'cnt', 'pct']

# Chart
title={"text": "Almost Half of CPS High Schools Have No Counselors, Regardless of Quality",
       "subtitle": ["Breakdown of Schools by Counselor Availability and Quality (2019)",
                    ""],
       # "subtitleColor": "grey",
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }

pct_bars = alt.Chart(d).mark_bar().encode(
    x=alt.X('pct:Q', 
            title='Percentage of Schools (%)'
    ),
    y=alt.Y('Quality:N',
              sort=sorted_quality,
              title='School Quality'),
    color=alt.Color('has_counsel:N',
                    title='',
                    sort=['No Counselor', 'Has Counselor'])
).properties(height=250, width=400
)

text1 = alt.Chart(d).mark_text(dx=-25, 
                              dy=1, 
                              color='white',
                              font='Futura', 
                              fontSize=15,).encode(
    x=alt.X('pct:Q', stack='zero'),
    y=alt.Y('Quality:N'),
    detail='has_counsel:N',
    text=alt.Text('pct:Q', format='.1f')
)

d2 = school_counsel.Quality.value_counts().to_frame().reset_index()
d2.columns = ['Quality', 'cnt']
cnt_bars = alt.Chart(d2).mark_bar(color='#909090').encode(
    x=alt.X('cnt:Q', 
            title='Number of Schools'
    ),
    y = alt.Y('Quality:N',
              axis=None)
).properties(height=250, width=250
)

# text2 = alt.Chart(d2).mark_text(dx=16, 
#                               dy=1, 
#                               color='#909090',
#                               font='Futura', 
#                               fontSize=15,).encode(
#     x=alt.X('cnt:Q'),
#     y=alt.Y('Quality:N',
#             axis=None),
#     text=alt.Text('cnt:Q')
# )


chart = ((pct_bars + text1) | cnt_bars).configure_legend(orient='top'
).configure_view(width=600)
chart.title = title
chart
Out[221]:

Roughly half of high- and medium-quality high schools and about a third of low-quality high schools in CPS have no counselor at all. This points to substantial room for improvement across CPS schools.
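The percentages in this chart are shares within each quality level. As a minimal sketch of that normalization, the snippet below uses a small made-up table (not the CPS data) with the same `Quality` and `has_counsel` columns, and divides each count by its quality level's total via `transform('sum')`. This is one way to write the step, not necessarily how the chart above was built.

```python
import pandas as pd

# Hypothetical toy data standing in for `school_counsel`; values are illustrative only.
toy = pd.DataFrame({
    'Quality': ['High Quality'] * 4 + ['Low Quality'] * 3,
    'has_counsel': ['Has Counselor', 'Has Counselor', 'No Counselor', 'No Counselor',
                    'Has Counselor', 'Has Counselor', 'No Counselor'],
})

# Count schools in each (quality, availability) cell.
counts = toy.groupby(['Quality', 'has_counsel']).size().rename('cnt').reset_index()

# Divide each count by its quality level's total so shares sum to 100 within a level.
counts['pct'] = 100 * counts['cnt'] / counts.groupby('Quality')['cnt'].transform('sum')
print(counts)
```

Because `transform` returns a result aligned with the original rows, the percentage column can be assigned directly without reshaping the index.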

In [222]:
#@title
# Histogram
d = school_counsel.groupby(
    ['Quality', 'num_counsel_FT']).size()
pcts = d.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
pcts = pcts.to_frame().reset_index()
pcts.columns = ['Quality', 'num', 'perc']

title={"text": "Most Low-Quality Schools with Counselors Only Have One",
       "subtitle": ["Normalized Distribution of Counselor Count Per School at Each Quality Level (2019)",
                    ""],
       # "subtitleColor": "grey",
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }

hist = alt.Chart(pcts).mark_area(interpolate='natural').encode(
    x=alt.X('num:Q', 
            title='Number of Full-Time School Counselors at One School'
    ),
    y=alt.Y('perc:Q',
            title='Schools (%)'),
    color=alt.Color('Quality:N',
                    scale=alt.Scale(domain=sorted_quality),
                    title='School Quality',
    )
).transform_filter(alt.datum.Quality != 'N/A'
).properties(height=150, width=700
).facet(row=alt.Row('Quality:N',
                    sort=sorted_quality, 
                    title=''
                    ),
        spacing=10
).properties(
    title=title
)
hist.configure_legend(orient='top')
Out[222]:

From the previous chart, we saw that low-quality schools are actually more likely to have at least one counselor than higher-quality schools. However, when we focus only on schools with counselors, we see that the majority of low-quality schools have only one. In contrast, more high- and medium-quality schools have multiple counselors, and higher-quality schools have more spread-out distributions of counselor counts.

In [224]:
#@title
# make distribution graph
title={"text": "A Wide Spread of Student-Counselor Ratio Exists at Every Quality Level",
       "subtitle": ["Distribution of Student-Counselor Ratio By School Quality, Excluding Those with No Counselors (2019)",
                    "*Pink line indicates the median student-counselor ratio of the quality level",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }
distr = alt.Chart(school_counsel).mark_tick(size=120,
                                            thickness=3,
                                            opacity=.8).encode(
    x=alt.X('num_std_per_csl:Q', 
            title='Student-Counselor Ratio (# Students Per Counselor)'
),
    y=alt.Y('Quality:N', 
            sort=sorted_quality, 
            title='School Quality',
            axis={'labelAngle': -90
            }),
    color=alt.Color('Quality:N', 
                    scale=alt.Scale(domain=sorted_quality),
                    title='School Quality')
).transform_filter(alt.datum.Quality != 'N/A'
).properties(width=700, height=400, title=title)

# make tick marking the median of each quality level
median = alt.Chart(school_counsel).mark_tick(color='#FF3366', 
                                             size=120,
                                             thickness=3).encode(
    x=alt.X('median(num_std_per_csl):Q', axis=None),
    y=alt.Y('Quality:O', 
            sort=sorted_quality, 
            title='School Quality')
).transform_filter(alt.datum.Quality != 'N/A'
).properties(width=700, height=400)

# make median text
text = alt.Chart(school_counsel).mark_text(
    align='left',
    baseline='bottom',
    color='#FF3366',
    font='Futura',
    size=14,
    dx=3,
    dy=3
).encode(
    x=alt.X('median(num_std_per_csl):Q', axis=None),
    y=alt.Y('Quality:O', 
            sort=sorted_quality, 
            title='School Quality'),
    text=alt.Text('median(num_std_per_csl):Q',format=',.0f')
).transform_filter(alt.datum.Quality != 'N/A')

(distr + median + text).configure_legend(orient='top')
Out[224]:

For now, let's focus on schools with at least one counselor. Are students in lower quality schools more likely to have worse counselor availability, because most low-quality schools tend to only have one counselor?

A closer look tells us no. In fact, low-quality schools have a slightly lower median student-counselor ratio (222:1), meaning higher counselor availability for students, than schools in good standing. This is likely because low-quality schools also tend to have smaller student counts.

However, the more important lesson is that the student-counselor ratio varies widely at each quality level, ranging from around 50:1 to above 400:1, which indicates large inequalities in access to the resources school counselors have to offer.
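To make the median comparison concrete, here is a minimal sketch of how a per-level median ratio can be computed while excluding schools with no counselor. The `students` and `counselors` columns and all values are hypothetical; only `Quality` and `num_std_per_csl` mirror names used in this notebook, where schools without counselors carry a ratio of infinity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; `students` and `counselors` are assumed column names.
toy = pd.DataFrame({
    'Quality': ['High Quality', 'High Quality', 'High Quality',
                'Low Quality', 'Low Quality'],
    'students': [900, 600, 400, 220, 300],
    'counselors': [3, 2, 0, 1, 0],
})

# Students per counselor; dividing by NaN avoids a divide-by-zero,
# and the missing results are then marked as infinite.
ratio = toy['students'] / toy['counselors'].replace(0, np.nan)
toy['num_std_per_csl'] = ratio.fillna(np.inf)

# Keep only schools with at least one counselor before taking medians.
finite = toy[np.isfinite(toy['num_std_per_csl'])]
medians = finite.groupby('Quality')['num_std_per_csl'].median()
print(medians)
```

Dropping the infinite ratios first matters: a median taken over a group that includes them would be pulled upward by the schools with no counselor.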

In [225]:
#@title
title={"text": ["Schools Offering Counselors Have Higher Average and Lower Variance",
                "in College Enrollment Rates than Their Counterparts"],
       "subtitle": ["Distribution of College Enrollment Rate by Counselor Availability, at Each Quality Level (2019)",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }

box = alt.Chart(school_counsel).mark_boxplot(size=50).encode(
    x=alt.X('has_counsel:N',axis=alt.Axis(title='',
                                          labelAngle=0)),
    y=alt.Y('College_Enrollment_Rate_School:Q',
            axis=alt.Axis(title='Average College Enrollment Rate')),
    color=alt.Color('has_counsel:N', title='')
).transform_filter(alt.datum.Quality != 'N/A'
).properties(width=200  
).facet(column=alt.Column('Quality', title=None), 
        spacing=10)
box.title = title
box.configure_legend(orient='top'
    ).configure_view(width=600)
Out[225]:

We have looked at how counselors are distributed across schools at different quality levels. But does having access to counselors really make a difference for students? The average college enrollment rates at schools with counselors are higher and vary less than those at schools without counselors. The difference is larger at high- and medium-quality schools (around 10%) and much smaller at low-quality schools (less than 5%). A likely explanation is that at low-quality schools the adverse effects of other factors, such as teacher quality and campus safety, can easily outweigh the positive effect of counselors.
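The box plots can be backed by a simple numeric summary. The sketch below, using made-up enrollment rates rather than the CPS data, groups by quality and counselor availability and reports the mean, median, and standard deviation of `College_Enrollment_Rate_School`.

```python
import pandas as pd

# Hypothetical toy data; the rates are illustrative, not CPS figures.
toy = pd.DataFrame({
    'Quality': ['High Quality'] * 6,
    'has_counsel': ['Has Counselor'] * 3 + ['No Counselor'] * 3,
    'College_Enrollment_Rate_School': [70.0, 75.0, 80.0, 50.0, 65.0, 80.0],
})

# Center (mean, median) and spread (sample standard deviation) per group.
summary = (toy.groupby(['Quality', 'has_counsel'])['College_Enrollment_Rate_School']
              .agg(['mean', 'median', 'std']))
print(summary)
```

In this toy example the schools with counselors have both a higher mean and a smaller standard deviation, which is the pattern the box plots are read for.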

In [226]:
#@title
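main_palette = ["#98b3b5",
                  "#d77a66",
                  "#59483d"]
# Gray for schools outside the highlighted quality level. `color_excl` is not
# defined in any visible cell, so this value is an assumption.
color_excl = '#d3d3d3'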

def plot_college_on_ratio(quality, color, y_axis=True):
    if y_axis:
      y_axis_title = 'College Enrollment Rate (School Average)'
    else:
      y_axis_title = ''

    x_axis_title = ['Student-Counselor Ratio', 
                    '@ {} Schools'.format(quality)]

    points = alt.Chart(school_counsel).mark_point(opacity=0.8).encode(
        alt.X('num_std_per_csl:Q',
              axis=alt.Axis(title=x_axis_title),
              scale=alt.Scale(domain=[0, 600])
        ),
        alt.Y('College_Enrollment_Rate_School:Q', 
              axis=alt.Axis(title=y_axis_title),
              scale=alt.Scale(domain=[0, 100])
        ),
        color=alt.condition(alt.datum.Quality == quality,
                            alt.value(color),
                            alt.value(color_excl)
        )
    ).transform_filter(alt.datum.has_counsel =='Has Counselor'
    ).properties(width=200)
    
    line = points.transform_filter(alt.datum.Quality == quality
          ).transform_regression(on='num_std_per_csl', 
                                 extent=[100, 500],
                                  regression='College_Enrollment_Rate_School',
                                  method="poly",
                                  order=3                       
          ).mark_line().encode(
              color=alt.value(color))
    return alt.layer(points, line).resolve_scale(color='independent')

def plot_college_on_salary(quality, color, y_axis=True):
    if y_axis:
      y_axis_title = 'College Enrollment Rate (School Average)'
    else:
      y_axis_title = ''

    x_axis_title = ['Annual Salary of Full-Time',
                    'Counselors (School Average)', 
                    '@ {} Schools'.format(quality)]

    points = alt.Chart(school_counsel).mark_point(opacity=0.8).encode(
        alt.X('FTE Annual Salary:Q',
            axis=alt.Axis(title=x_axis_title),
            scale=alt.Scale(domain=[50000, 110000])
        ),
        alt.Y('College_Enrollment_Rate_School:Q', 
              axis=alt.Axis(title=y_axis_title),
              scale=alt.Scale(domain=[0, 100])
        ),
        color=alt.condition(alt.datum.Quality == quality,
                            alt.value(color),
                            alt.value(color_excl)
        )
    ).transform_filter(alt.datum.has_counsel == 'Has Counselor'
    ).properties(width=200)
    
    line = points.transform_filter(alt.datum.Quality == quality
          ).transform_regression(on='FTE Annual Salary', 
                                #  extent=[100, 500],
                                 regression='College_Enrollment_Rate_School',
                                 method="linear"                
          ).mark_line().encode(
              color=alt.value(color))
    return alt.layer(points, line).resolve_scale(color='independent')

ratio_title = {"text": ["It's Ambiguous How Student-Counselor Ratio Relates to College Enrollment"],
       "subtitle": ["Student-Counselor Ratio and Average College Enrollment Rate, at Each Quality Level (2019)",
                    "*Colored points are schools at the given quality, gray points are the rest of CPS high schools",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }
college_on_ratio = alt.hconcat(plot_college_on_ratio('High Quality', main_palette[0]), 
            plot_college_on_ratio('Medium Quality', main_palette[1], False), 
            plot_college_on_ratio('Low Quality', main_palette[2], False)
            )
college_on_ratio.title = ratio_title


salary_title = {"text": ["At Low-Quality Schools, Higher Counselor Salary is Correlated with Higher",
                         "College Enrollment Rate"],
       "subtitle": ["Average Annual Salary of Full-Time Counselors and Average College Enrollment Rate, at Each Quality Level (2019)",
                    "*Colored points are schools at the given quality, gray points are the rest of CPS high schools",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }
college_on_salary = alt.hconcat(plot_college_on_salary('High Quality', main_palette[0]), 
            plot_college_on_salary('Medium Quality', main_palette[1], False), 
            plot_college_on_salary('Low Quality', main_palette[2], False)
            )
college_on_salary.title = salary_title

alt.vconcat(college_on_ratio,
            college_on_salary,
            spacing=50)
Out[226]:

In general, it is ambiguous how student-counselor ratios are correlated with students' college enrollment rates at CPS high schools. At low-quality schools, higher counselor availability is weakly correlated with higher college enrollment rates. However, this relationship doesn't exist in high- or medium-quality schools.

Unlike the student-counselor ratio, higher counselor salary is strongly correlated with higher college enrollment rate at low-quality schools. One possible explanation is that higher salaries attract higher-quality counselors, who make a larger difference in students' academic outcomes.

However, this relationship is not present in high- and medium-quality schools. In fact, there is a strong negative correlation between those variables at high-quality schools.
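The direction and strength of these relationships can be checked numerically with a Pearson correlation per quality level. This is a minimal sketch on made-up values (not the CPS data), reusing the `FTE Annual Salary` and `College_Enrollment_Rate_School` column names from the charts.

```python
import pandas as pd

# Hypothetical toy data; salaries and rates are illustrative only.
toy = pd.DataFrame({
    'Quality': ['Low Quality'] * 4 + ['High Quality'] * 4,
    'FTE Annual Salary': [60000, 70000, 80000, 90000,
                          60000, 70000, 80000, 90000],
    'College_Enrollment_Rate_School': [30.0, 40.0, 45.0, 60.0,
                                       90.0, 85.0, 80.0, 70.0],
})

# Pearson correlation (the default for Series.corr) within each quality level.
corr_by_quality = {
    quality: group['FTE Annual Salary'].corr(group['College_Enrollment_Rate_School'])
    for quality, group in toy.groupby('Quality')
}
print(corr_by_quality)
```

A coefficient near +1 or -1 signals a strong linear relationship, while a value near 0 matches the "ambiguous" reading. With only a handful of schools per level, such coefficients should be interpreted cautiously.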

In [0]:
#@title
%%capture
# convert school and counselor data into a geodataframe
school_counsel = school_counsel.dropna(axis=0, subset=['geometry'])

# add school location into school data
school_loc = gpd.read_file('data/CPS_School Locations SY1819.geojson')
school_loc['school_id'] = school_loc['school_id'].astype('int')
school_counsel.drop(columns='geometry', inplace=True)
school_counsel = school_counsel.merge(school_loc[['school_id', 'geometry']], 
                                      how='left', 
                                      left_on='School_ID', 
                                      right_on='school_id')
school_counsel = school_counsel.drop(columns='school_id')
school_counsel = gpd.GeoDataFrame(school_counsel, geometry='geometry')

area_school = gpd.sjoin(areas, school_counsel)
d = area_school.groupby(['name', 'has_counsel']).size()
access = d.groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
access = access.to_frame().reset_index()
access = access[access['has_counsel'] =='Has Counselor'].drop(columns='has_counsel')
access.columns = ['name', 'has_counsel_pct']

d = area_school.groupby(['Quality', 'name']).agg({'num_std_per_csl': 'mean'})
d = d.loc['Low Quality',:]
d.columns = ['ratio_at_lq']

area_ratio = areas.merge(d, how='left', left_on='name', right_index=True)
area_ratio = area_ratio.merge(access, how='left', on='name')
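# areas with no matched schools get 0%; assign back instead of chained inplace fill
area_ratio['has_counsel_pct'] = area_ratio['has_counsel_pct'].fillna(0)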

area_ratio = area_ratio[area_ratio['slug'] != 'ohare']
In [390]:
#@title 
econ_map = alt.Chart(areas).mark_geoshape(stroke='white').encode(
    color=alt.Color('HARDSHIP INDEX:Q',
                    scale=alt.Scale(scheme='goldred'),
                    legend=alt.Legend(title='Economic Hardship Index',
                                      titleFontSize =12,
                                      orient='right')
                    )
    ).properties(
        width=600)
lq = school_counsel[(school_counsel.Quality == 'Low Quality') &
                    (school_counsel['FTE Annual Salary'] > 80000)]
school_points = alt.Chart(lq).mark_geoshape(opacity=0.9,
                                            color=main_palette[2]
                                            ).properties()
chart = (econ_map + school_points).resolve_scale(color='independent'
        ).configure_view(height=600)
title = {"text": ["Low-Quality Schools with Counselor Salary Over 80k Are Spread Throughout",
                  "Poor Neighborhoods"],
       "subtitle": ["Location of CPS High Schools with Average Counselor Salary Over $80,000 and Economic Hardship Index",
                    "at Community Area Level (2019)",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }
chart.title = title
chart
Out[390]:

Since low-quality schools with higher counselor salaries tend to have higher college enrollment rates, let's take a look at where they are. They are evenly distributed throughout the poor neighborhoods, which is encouraging since they potentially improve residents' opportunities for social mobility.

In [384]:
#@title
title = {"text": ["Poor Neighborhoods in Chicago Have Less Counseling Support for Students,",
                  "Except for Those on the Southwest Side"],
       "subtitle": ["Economic Hardship Index and Percentage of CPS High Schools with Counselors, at Community Area Level (2019)",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }

# make background choropleth map of economic hardship index
econ_map = alt.Chart(areas).mark_geoshape(stroke='white').encode(
    color=alt.Color('HARDSHIP INDEX:Q',
                    scale=alt.Scale(scheme='goldred'),
                    legend=alt.Legend(title='Economic Hardship Index',
                                      titleFontSize =12,
                                      orient='top')
                    )
    ).properties(
        width=400)
    
access_map = alt.Chart(area_ratio).mark_geoshape(stroke='white').encode(
    color=alt.Color('has_counsel_pct:Q', 
                    legend=alt.Legend(title='% Schools with Counselors',
                                      titleFontSize=12,
                                      orient='top'),
                    scale=alt.Scale(scheme='tealblues'),
                    )).properties(
        width=400)
chart = (econ_map | access_map).resolve_scale(color='independent'
)
chart.title = title
chart
Out[384]:

The poor neighborhoods on the South Side and West Side of Chicago have much lower access to schools with counselors than the rest of the city. The only area with hard economic conditions but a high percentage of schools with counselors is the Southwest Side.

In [0]:
#@title
%%capture
area_ratio['centroidx'] = area_ratio['geometry'].centroid.x
area_ratio['centroidy'] = area_ratio['geometry'].centroid.y

sph_map = alt.Chart(area_ratio).mark_geoshape(stroke='white').encode(
    color=alt.Color('single-parent-households:Q',
                    scale=alt.Scale(scheme='redpurple'),
                    legend=alt.Legend(title='% Single Parent Households')
                    )
    ).properties(width=600)
    
size_access = alt.Chart(area_ratio).mark_circle(size=80,
                                                opacity=0.8,
                                                color='#4aa2bd',
                                                stroke='white',
                                                strokeWidth=1
                                                ).encode(
    longitude='centroidx:Q',
    latitude='centroidy:Q',
    size=alt.Size('has_counsel_pct:Q', 
                  title='% Schools with Counselors'),
).properties(width=600)
In [365]:
#@title
title =  {"text": ["Neighborhoods with Higher Percentage of Single Parent Households Have",
                    "Lower Concentration of Schools Offering Counselors"],
       "subtitle": ["Percentage of Single Parent Households and CPS High Schools with Counselors, at Community Area Level (2019)",
                    ""],
       "subtitleFontSize": 16,
       "subtitleFont": "Futura",
       }
chart = sph_map + size_access
chart.title = title
chart.configure_legend(orient='right',
                       titleFontSize=12
                       ).configure_view(height=600)
Out[365]:

Adolescents in single-parent households tend to experience higher economic and psychological distress and have a greater need for counselors' support. However, Chicago neighborhoods with a higher percentage of single-parent households have a lower percentage of schools with counselors, indicating a gap between the need and the support provided.